{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Chris Holden (ceholden@gmail.com) - https://github.com/ceholden\n",
    "\n",
    "Chapter 4: Importing and using vector data -- the OGR library\n",
    "=================================================="
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "The *OGR* library is a companion library to *GDAL* that handles vector data capabilities, including information queryies, file conversions, rasterization of polygon features, polygonization of raster features, and much more. It handles popular formats including the *ESRI Shapefile*, *Keyhole Markup Language*, *PostGIS*, and *SpatiaLite*. For more information on how *OGR* came about and how it relates to *GDAL*, see here: http://trac.osgeo.org/gdal/wiki/FAQGeneral#WhatisthisOGRstuff.\n",
    "\n",
    "The authors of *GDAL*/*OGR* provide a tutorial for the *OGR* library [here](http://www.gdal.org/ogr_apitut.html).\n",
    "\n",
    "> Note: As of 08/12/2014, the API used in this tutorial seems to be ahead of the current 1.11.0 release. Specifically, they demonstrate how to open a vector file using `gdal.OpenEx`, which is a change designed to unify the *GDAL* and *OGR* sections of the library.\n",
    "    \n",
    "> A clone of the *GDAL 1.9* API tutorial can be found [here](http://www.compsci.wm.edu/SciClone/documentation/software/geo/gdal-1.9.0/html/ogr/ogr_apitut.html)\n",
    "\n",
    "In this chapter we will use an *ESRI Shapefile* that contains training data I collected in QGIS for the example image we've been working on.\n",
    "\n",
    "## Opening an *ESRI Shapefile*\n",
    "\n",
    "Just like *GDAL*, *OGR* abstracts the file formats so that we can use the same code for any format. It employs the same concept of a *dataset* object which we can gather information from:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Import Python 3 print function\n",
    "from __future__ import print_function\n",
    "\n",
    "# Import OGR - \n",
    "from osgeo import ogr\n",
    "\n",
    "# Open the dataset from the file\n",
    "dataset = ogr.Open('../../example/training_data.shp')\n",
    "# Make sure the dataset exists -- it would be None if we couldn't open it\n",
    "if not dataset:\n",
    "    print('Error: could not open dataset')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With our Shapefile read in, we can look at some of its properties:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset driver is: ESRI Shapefile\n",
      "\n",
      "The shapefile has 1 layer(s)\n",
      "\n",
      "The layer is named: training_data\n",
      "\n",
      "The layer's geometry is: Polygon\n",
      "\n",
      "Layer projection is: +proj=utm +zone=15 +datum=WGS84 +units=m +no_defs \n",
      "\n",
      "Layer has 30 features\n",
      "\n",
      "Layer has 2 fields\n",
      "Their names are: \n",
      "\tid - Integer64\n",
      "\tclass - String\n"
     ]
    }
   ],
   "source": [
    "### Let's get the driver from this file\n",
    "driver = dataset.GetDriver()\n",
    "print('Dataset driver is: {n}\\n'.format(n=driver.name))\n",
    "\n",
    "### How many layers are contained in this Shapefile?\n",
    "layer_count = dataset.GetLayerCount()\n",
    "print('The shapefile has {n} layer(s)\\n'.format(n=layer_count))\n",
    "\n",
    "### What is the name of the 1 layer?\n",
    "layer = dataset.GetLayerByIndex(0)\n",
    "print('The layer is named: {n}\\n'.format(n=layer.GetName()))\n",
    "\n",
    "### What is the layer's geometry? is it a point? a polyline? a polygon?\n",
    "# First read in the geometry - but this is the enumerated type's value\n",
    "geometry = layer.GetGeomType()\n",
    "\n",
    "# So we need to translate it to the name of the enum\n",
    "geometry_name = ogr.GeometryTypeToName(geometry)\n",
    "print(\"The layer's geometry is: {geom}\\n\".format(geom=geometry_name))\n",
    "\n",
    "### What is the layer's projection?\n",
    "# Get the spatial reference\n",
    "spatial_ref = layer.GetSpatialRef()\n",
    "\n",
    "# Export this spatial reference to something we can read... like the Proj4\n",
    "proj4 = spatial_ref.ExportToProj4()\n",
    "print('Layer projection is: {proj4}\\n'.format(proj4=proj4))\n",
    "\n",
    "### How many features are in the layer?\n",
    "feature_count = layer.GetFeatureCount()\n",
    "print('Layer has {n} features\\n'.format(n=feature_count))\n",
    "\n",
    "### How many fields are in the shapefile, and what are their names?\n",
    "# First we need to capture the layer definition\n",
    "defn = layer.GetLayerDefn()\n",
    "\n",
    "# How many fields\n",
    "field_count = defn.GetFieldCount()\n",
    "print('Layer has {n} fields'.format(n=field_count))\n",
    "\n",
    "# What are their names?\n",
    "print('Their names are: ')\n",
    "for i in range(field_count):\n",
    "    field_defn = defn.GetFieldDefn(i)\n",
    "    print('\\t{name} - {datatype}'.format(name=field_defn.GetName(),\n",
    "                                         datatype=field_defn.GetTypeName()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The shapefile is already projected in the same projection that our example raster image is in, so we won't be needing to reproject it. You *could*, however, do so using either the [ogr2ogr](http://www.gdal.org/ogr2ogr.html) command line application, or by [reprojecting the shapefile in Python](http://pcjericks.github.io/py-gdalogr-cookbook/projection.html#reproject-a-layer).\n",
    "\n",
    "## Tie-in with our Raster dataset\n",
    "\n",
    "The training data we just opened contains two fields:\n",
    "+ an ID field (Integer datatype)\n",
    "+ a class field (String datatype)\n",
    "\n",
    "Combined with the innate location information of polygons in a Shapefile, fields resemble all that we need to use for pairing labels (i.e., the integer ID and the string description) with the information in our raster.\n",
    "\n",
    "However, in order to pair up our vector data with our raster pixels, we will need a way of co-aligning the datasets in space. \n",
    "\n",
    "One (complicated) way of doing this would be to manually loop through each polygon in our vector layer and determine which pixels from our raster are contained within. This approach is exactly what GIS softwares (e.g., ENVI, ArcGIS, QGIS) do when doing pairing rasters with vectors, like when doing zonal statistics.\n",
    "\n",
    "Another less complicated way would be to use the concept of a Region of Interest (ROI) image where each non-zero pixel value in our ROI image corresponds to a raster representation of a polygon from our vector layer. In the example of our training data, most of the values would be 0 in the rasterized representation because our training data samples are small compared to the entire study area. Because we have assigned an integer ID field to each polygon, we could use these integers to store information about which polygons belong to which pixels. In this case, I have assigned values ranging from 1 - 5 for the classes:\n",
    "\n",
    "+ 1 - forest\n",
    "+ 2 - water\n",
    "+ 3 - herbaceous\n",
    "+ 4 - barren\n",
    "+ 5 - urban\n",
    "\n",
    "To accomplish this rasterization of a vector layer, we could use the GDAL command line utility [gdal_rasterize](http://www.gdal.org/gdal_rasterize.html), but we can stick to pure Python by using the GDAL function [gdal.RasterizeLayer](http://gdal.org/gdal__alg_8h.html#adfe5e5d287d6c184aab03acbfa567cb1)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Command line version -- gdal_rasterize"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First thing we need to do is to figure out what the spatial extent and the pixel size of our output raster. To do this, I will use [gdalinfo](http://www.gdal.org/gdalinfo.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Driver: GTiff/GeoTIFF\n",
      "Files: ../../example/LE70220491999322EDC01_stack.gtif\n",
      "       ../../example/LE70220491999322EDC01_stack.gtif.aux.xml\n",
      "Size is 250, 250\n",
      "Coordinate System is:\n",
      "PROJCS[\"unnamed\",\n",
      "    PROJECTION[\"Transverse_Mercator\"],\n",
      "    PARAMETER[\"latitude_of_origin\",0],\n",
      "    PARAMETER[\"central_meridian\",-93],\n",
      "    PARAMETER[\"scale_factor\",0.9996],\n",
      "    PARAMETER[\"false_easting\",500000],\n",
      "    PARAMETER[\"false_northing\",0],\n",
      "    UNIT[\"metre\",1,\n",
      "        AUTHORITY[\"EPSG\",\"9001\"]]]\n",
      "PROJ.4 string is:\n",
      "'+proj=utm +zone=15 +ellps=WGS84 +units=m +no_defs '\n",
      "Origin = (462405.000000000000000,1741815.000000000000000)\n",
      "Pixel Size = (30.000000000000000,-30.000000000000000)\n",
      "Metadata:\n",
      "  AREA_OR_POINT=Area\n",
      "  Band_1=band 1 reflectance\n",
      "  Band_2=band 2 reflectance\n",
      "  Band_3=band 3 reflectance\n",
      "  Band_4=band 4 reflectance\n",
      "  Band_5=band 5 reflectance\n",
      "  Band_6=band 7 reflectance\n",
      "  Band_7=band 6 temperature\n",
      "  Band_8=Band 8\n",
      "Image Structure Metadata:\n",
      "  INTERLEAVE=PIXEL\n",
      "Corner Coordinates:\n",
      "Upper Left  (  462405.000, 1741815.000) \n",
      "Lower Left  (  462405.000, 1734315.000) \n",
      "Upper Right (  469905.000, 1741815.000) \n",
      "Lower Right (  469905.000, 1734315.000) \n",
      "Center      (  466155.000, 1738065.000) \n",
      "Band 1 Block=250x2 Type=Int16, ColorInterp=Gray\n",
      "  Description = band 1 reflectance\n",
      "  Min=198.000 Max=1810.000 \n",
      "  Minimum=198.000, Maximum=1810.000, Mean=439.016, StdDev=139.717\n",
      "  NoData Value=-9999\n",
      "  Metadata:\n",
      "    STATISTICS_MAXIMUM=1810\n",
      "    STATISTICS_MEAN=439.015984\n",
      "    STATISTICS_MINIMUM=198\n",
      "    STATISTICS_STDDEV=139.7168287663\n",
      "Band 2 Block=250x2 Type=Int16, ColorInterp=Undefined\n",
      "  Description = band 2 reflectance\n",
      "  Min=315.000 Max=2294.000 \n",
      "  Minimum=315.000, Maximum=2294.000, Mean=661.543, StdDev=180.790\n",
      "  NoData Value=-9999\n",
      "  Metadata:\n",
      "    STATISTICS_MAXIMUM=2294\n",
      "    STATISTICS_MEAN=661.54288\n",
      "    STATISTICS_MINIMUM=315\n",
      "    STATISTICS_STDDEV=180.78985343571\n",
      "Band 3 Block=250x2 Type=Int16, ColorInterp=Undefined\n",
      "  Description = band 3 reflectance\n",
      "  Min=160.000 Max=2820.000 \n",
      "  Minimum=160.000, Maximum=2820.000, Mean=589.380, StdDev=270.708\n",
      "  NoData Value=-9999\n",
      "  Metadata:\n",
      "    STATISTICS_MAXIMUM=2820\n",
      "    STATISTICS_MEAN=589.379808\n",
      "    STATISTICS_MINIMUM=160\n",
      "    STATISTICS_STDDEV=270.70755024913\n",
      "Band 4 Block=250x2 Type=Int16, ColorInterp=Undefined\n",
      "  Description = band 4 reflectance\n",
      "  Min=1105.000 Max=5138.000 \n",
      "  Minimum=1105.000, Maximum=5138.000, Mean=3442.298, StdDev=461.059\n",
      "  NoData Value=-9999\n",
      "  Metadata:\n",
      "    STATISTICS_MAXIMUM=5138\n",
      "    STATISTICS_MEAN=3442.297712\n",
      "    STATISTICS_MINIMUM=1105\n",
      "    STATISTICS_STDDEV=461.05944906873\n",
      "Band 5 Block=250x2 Type=Int16, ColorInterp=Undefined\n",
      "  Description = band 5 reflectance\n",
      "  Min=353.000 Max=4548.000 \n",
      "  Minimum=353.000, Maximum=4548.000, Mean=2181.929, StdDev=427.101\n",
      "  NoData Value=-9999\n",
      "  Metadata:\n",
      "    STATISTICS_MAXIMUM=4548\n",
      "    STATISTICS_MEAN=2181.928672\n",
      "    STATISTICS_MINIMUM=353\n",
      "    STATISTICS_STDDEV=427.10099628111\n",
      "Band 6 Block=250x2 Type=Int16, ColorInterp=Undefined\n",
      "  Description = band 7 reflectance\n",
      "  Min=145.000 Max=3705.000 \n",
      "  Minimum=145.000, Maximum=3705.000, Mean=1049.994, StdDev=375.115\n",
      "  NoData Value=-9999\n",
      "  Metadata:\n",
      "    STATISTICS_MAXIMUM=3705\n",
      "    STATISTICS_MEAN=1049.99384\n",
      "    STATISTICS_MINIMUM=145\n",
      "    STATISTICS_STDDEV=375.11543521702\n",
      "Band 7 Block=250x2 Type=Int16, ColorInterp=Undefined\n",
      "  Description = band 6 temperature\n",
      "  Min=2335.000 Max=3546.000 \n",
      "  Minimum=2335.000, Maximum=3546.000, Mean=2678.677, StdDev=158.668\n",
      "  NoData Value=-9999\n",
      "  Metadata:\n",
      "    STATISTICS_MAXIMUM=3546\n",
      "    STATISTICS_MEAN=2678.677184\n",
      "    STATISTICS_MINIMUM=2335\n",
      "    STATISTICS_STDDEV=158.66755034924\n",
      "Band 8 Block=250x2 Type=Int16, ColorInterp=Undefined\n",
      "  Min=0.000 Max=0.000 \n",
      "  Minimum=0.000, Maximum=0.000, Mean=0.000, StdDev=0.000\n",
      "  NoData Value=-9999\n",
      "  Metadata:\n",
      "    STATISTICS_MAXIMUM=0\n",
      "    STATISTICS_MEAN=0\n",
      "    STATISTICS_MINIMUM=0\n",
      "    STATISTICS_STDDEV=0\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "# Remember -- \"%bash\" as seen above just indicates to the IPython notebook that I'm now writing in Bash\n",
    "\n",
    "# Print out metadata about raster -- we include \"-proj4\" to print out the Proj4 projection string\n",
    "gdalinfo -proj4 ../../example/LE70220491999322EDC01_stack.gtif"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Wow - how informative!\n",
    "\n",
    "We will use the information about the Upper Left and Lower Right coordinates:\n",
    "\n",
    "> Upper Left  (  462405.000, 1741815.000) ( 93d21' 3.44\"W, 15d45'16.33\"N)\n",
    "\n",
    "> Lower Right (  469905.000, 1734315.000) ( 93d16'51.06\"W, 15d41'12.60\"N)\n",
    "\n",
    "This tells us that our Upper Left X/Y and Lower Right X/Y are \"462405, 1741815, 469905, 1734315\". We can also see that our Landsat pixels are 30x30m.\n",
    "\n",
    "The projection is UTM15N, and the projection string is `'+proj=utm +zone=15 +datum=WGS84 +units=m +no_defs '`\n",
    "\n",
    "We will need this information for `gdal_rasterize`. We can print the usage of `gdal_rasterize` as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Usage: gdal_rasterize [-b band]* [-i] [-at]\n",
      "       [-burn value]* | [-a attribute_name] [-3d] [-add]\n",
      "       [-l layername]* [-where expression] [-sql select_statement]\n",
      "       [-of format] [-a_srs srs_def] [-co \"NAME=VALUE\"]*\n",
      "       [-a_nodata value] [-init value]*\n",
      "       [-te xmin ymin xmax ymax] [-tr xres yres] [-tap] [-ts width height]\n",
      "       [-ot {Byte/Int16/UInt16/UInt32/Int32/Float32/Float64/\n",
      "             CInt16/CInt32/CFloat32/CFloat64}] [-q]\n",
      "       <src_datasource> <dst_filename>\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Missing source or destination.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "\n",
    "# Print out the usage\n",
    "gdal_rasterize --help"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For better descriptions, see the documentation page [here](http://www.gdal.org/gdal_rasterize.html).\n",
    "\n",
    "Now let's run the command -- note that we need to rearrange the Upper Left and Lower Right X/Y pairs to be in the \"xmin ymin xmax ymax\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0...10...20...30...40...50...60...70...80...90...100 - done.\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "\n",
    "# Explanation of switches:\n",
    "# -a ==> write values from the\"id\" attribute of the shapefile\n",
    "# -layer ==> the layer name of our shapefile\n",
    "# -of ==> Output raster file format\n",
    "# -a_srs ==> output spatial reference system string\n",
    "# -a_nodata ==> NODATA value for output raster\n",
    "# -te ==> target extent which matches the raster we want to create the ROI image for\n",
    "# -tr ==> target resolution, 30 x 30m\n",
    "# -ot Byte ==> Since we only have values 0 - 5, a Byte datatype is enough\n",
    "\n",
    "gdal_rasterize -a \"id\" \\\n",
    "    -l training_data \\\n",
    "    -of \"GTiff\" \\\n",
    "    -a_srs \"+proj=utm +zone=15 +datum=WGS84 +units=m +no_defs\" \\\n",
    "    -a_nodata 0 \\\n",
    "    -te 462405 1734315 469905 1741815 \\\n",
    "    -tr 30 30 \\\n",
    "    -ot Byte \\\n",
    "    ../../example/training_data.shp ../../example/training_data.gtif"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a lot of ways the command line version is easier than programming it using the Python bindings to GDAL's API. Continue on for an example using this second method:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Pure Python version -- gdal.RasterizeLayer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Success\n"
     ]
    }
   ],
   "source": [
    "# Import GDAL\n",
    "from osgeo import gdal\n",
    "\n",
    "# First we will open our raster image, to understand how we will want to rasterize our vector\n",
    "raster_ds = gdal.Open('../../example/LE70220491999322EDC01_stack.gtif', gdal.GA_ReadOnly)\n",
    "\n",
    "# Fetch number of rows and columns\n",
    "ncol = raster_ds.RasterXSize\n",
    "nrow = raster_ds.RasterYSize\n",
    "\n",
    "# Fetch projection and extent\n",
    "proj = raster_ds.GetProjectionRef()\n",
    "ext = raster_ds.GetGeoTransform()\n",
    "\n",
    "raster_ds = None\n",
    "\n",
    "# Create the raster dataset\n",
    "memory_driver = gdal.GetDriverByName('GTiff')\n",
    "out_raster_ds = memory_driver.Create('../../example/training_data.gtif', ncol, nrow, 1, gdal.GDT_Byte)\n",
    "\n",
    "# Set the ROI image's projection and extent to our input raster's projection and extent\n",
    "out_raster_ds.SetProjection(proj)\n",
    "out_raster_ds.SetGeoTransform(ext)\n",
    "\n",
    "# Fill our output band with the 0 blank, no class label, value\n",
    "b = out_raster_ds.GetRasterBand(1)\n",
    "b.Fill(0)\n",
    "\n",
    "# Rasterize the shapefile layer to our new dataset\n",
    "status = gdal.RasterizeLayer(out_raster_ds,  # output to our new dataset\n",
    "                             [1],  # output to our new dataset's first band\n",
    "                             layer,  # rasterize this layer\n",
    "                             None, None,  # don't worry about transformations since we're in same projection\n",
    "                             [0],  # burn value 0\n",
    "                             ['ALL_TOUCHED=TRUE',  # rasterize all pixels touched by polygons\n",
    "                              'ATTRIBUTE=id']  # put raster values according to the 'id' field values\n",
    "                             )\n",
    "\n",
    "# Close dataset\n",
    "out_raster_ds = None\n",
    "\n",
    "if status != 0:\n",
    "    print(\"I don't think it worked...\")\n",
    "else:\n",
    "    print(\"Success\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have **a** working method, we can check how many pixels of training data we collected:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check rasterized layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Class 0 contains 61400 pixels\n",
      "Class 1 contains 583 pixels\n",
      "Class 2 contains 24 pixels\n",
      "Class 3 contains 223 pixels\n",
      "Class 4 contains 173 pixels\n",
      "Class 5 contains 97 pixels\n"
     ]
    }
   ],
   "source": [
    "# Import NumPy for some statistics\n",
    "import numpy as np\n",
    "\n",
    "roi_ds = gdal.Open('../../example/training_data.gtif', gdal.GA_ReadOnly)\n",
    "\n",
    "roi = roi_ds.GetRasterBand(1).ReadAsArray()\n",
    "\n",
    "# How many pixels are in each class?\n",
    "classes = np.unique(roi)\n",
    "# Iterate over all class labels in the ROI image, printing out some information\n",
    "for c in classes:\n",
    "    print('Class {c} contains {n} pixels'.format(c=c,\n",
    "                                                 n=(roi == c).sum()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrapup\n",
    "\n",
    "Now that we have our ROI image, we can proceed to use it for pairing our labeled polygons with the matching pixels in our Landsat image to train a classifier for image classification. We continue this step in the next chapter (link to [webpage](chapter_5_classification.html) or [Notebook](chapter_5_classification.ipynb))."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
